Data Replication

Source: Michael Haslam
Source: Michael Haslam

Introduction

Background

Cultural knowledge and traditions may affect individual cognition in wild populations of primates. In this study, the authors compare the performance of two wild populations of bearded capuchin monkeys (Sapajus libidinosus) with two distinct tool use repertoires in a novel probing task. Only the population that already exhibited the use of probing tools was able to solve the foraging problem, suggesting that group cultural traditions significantly affect individual problem-solving in these populations.

Methodology

The researchers studied two populations of bearded capuchin monkeys living in northeastern Brazil:

  1. The Fazenda Boa Vista (FBV) group, which customarily uses stone tools to crack nuts, but not probe tools. (n = 16)

  2. The Serra da Capivara National Park (SCNP) group, which customarily uses a very broad toolkit, including stick probing tools. (n = 23)

The researchers set up a novel probing task using a transparent box and sugarcane molasses that was only obtainable by inserting a probe through a slit in the top of the box. For this replication, the relevant data they recorded within the open-access dataset is the following:

  • Number of times visited
  • Total length of time visited
  • Average time spent per visit
  • Number of probe events
  • Number of successes
  • Number of probe tools used

My replication

I will be replicating all the statistical analyses ran in this paper. Most of them are descriptive:

  • The mean and standard deviation of the total number of visits for both groups
  • The mean, standard deviation, median, and range for the length of direct interaction(s) with the task for both groups

There is one inferential statistic:

  • A mystery Mann-Whitney U test, otherwise known as a two-sample Wilcoxon rank sum test. More on this later.

I will also replicate both figures from this paper:

  • A bar graph depicting the number of sticks used per individual
  • A bar graph depicting proportion of successful probing by each individual

Preparation

Load packages:

> library(curl)
## Using libcurl 8.10.1 with Schannel
> library(ggplot2)

Loading in the data:

> f <- curl("https://raw.githubusercontent.com/clairezng/clairezh-AN588-Replication/main/DATASET_CARDOSO&OTTONI.csv")
> d <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
> # checking whether the data loaded correctly
> head(d) # looks fine!
##    ID POPULATION  SEX GROUP.AGE NUMVISIT TIME.VISIT Mean_TIMEVISIT FINGERS
## 1 TOR       SCNP MALE     ADULT       53       7715         145.57       0
## 2 BEI       SCNP MALE     ADULT       35       9974         284.97       1
## 3 ZAN       SCNP MALE     ADULT       63      10663         168.78       5
## 4 NIC       SCNP MALE     ADULT       42      11793         280.79       0
## 5 ZEN       SCNP MALE     ADULT       38       6513         171.39       2
## 6 CLA       SCNP MALE     ADULT       27       5570         206.30       0
##   PROBE.EVENT SUCCESSFUL NUMBER.OF.PROBE.TOOLS
## 1         489        457                    70
## 2         656        616                    52
## 3         655        632                    91
## 4         797        735                    89
## 5         548        518                    36
## 6         342        330                    29

Dividing populations

Before we start, it’ll also be helpful to partition the two populations (SCNP & FBV) into separate datasets in R, for ease of calculation later on.

> SCNP <- subset(d, POPULATION == "SCNP")
> FBV <- subset(d, POPULATION == "FBV")
> SCNP
##     ID POPULATION    SEX GROUP.AGE NUMVISIT TIME.VISIT Mean_TIMEVISIT FINGERS
## 1  TOR       SCNP   MALE     ADULT       53       7715         145.57       0
## 2  BEI       SCNP   MALE     ADULT       35       9974         284.97       1
## 3  ZAN       SCNP   MALE     ADULT       63      10663         168.78       5
## 4  NIC       SCNP   MALE     ADULT       42      11793         280.79       0
## 5  ZEN       SCNP   MALE     ADULT       38       6513         171.39       2
## 6  CLA       SCNP   MALE     ADULT       27       5570         206.30       0
## 7  BLP       SCNP   MALE       SUB       34       9010         265.00       0
## 8  CAP       SCNP   MALE  JUVENILE       47       9504         202.21       2
## 9  LIM       SCNP   MALE  JUVENILE       27       6219         230.33      11
## 10 COR       SCNP   MALE  JUVENILE       53       9659         182.25       7
## 11 VOL       SCNP   MALE  JUVENILE       64       9201         143.77      10
## 12 PAD       SCNP   MALE  JUVENILE       59       8175         138.56      10
## 13 DES       SCNP   MALE  JUVENILE       66       7281         110.32      13
## 14 CIN       SCNP   MALE  JUVENILE       77       7043          91.47       0
## 15 GOR       SCNP FEMALE     ADULT       56       7040         125.71     376
## 16 MAC       SCNP FEMALE     ADULT       69       8730         126.52     268
## 17 BEM       SCNP FEMALE     ADULT       58       7850         135.34     598
## 18 CAN       SCNP FEMALE     ADULT       27       4804         177.93     430
## 19 NIN       SCNP FEMALE     ADULT       17        977          57.47       7
## 20 LIC       SCNP FEMALE     ADULT       30       3514         117.13      36
## 21 ALI       SCNP FEMALE     ADULT       42       5131         122.17     318
## 22 VES       SCNP FEMALE     ADULT       16       2373         148.31      93
## 23 BAT       SCNP FEMALE  JUVENILE       67       7944         118.57     292
##    PROBE.EVENT SUCCESSFUL NUMBER.OF.PROBE.TOOLS
## 1          489        457                    70
## 2          656        616                    52
## 3          655        632                    91
## 4          797        735                    89
## 5          548        518                    36
## 6          342        330                    29
## 7          635        601                    31
## 8          784        766                    78
## 9          278        258                    42
## 10         805        726                   103
## 11         262        156                    25
## 12         150         37                    44
## 13          22          4                    12
## 14           0          0                     0
## 15           0         NA                    NA
## 16           0         NA                    NA
## 17           0         NA                    NA
## 18           0         NA                    NA
## 19           0         NA                    NA
## 20           0         NA                    NA
## 21           0         NA                    NA
## 22           0         NA                    NA
## 23           0         NA                    NA
> FBV # looks okay!
##     ID POPULATION    SEX GROUP.AGE NUMVISIT TIME.VISIT Mean_TIMEVISIT FINGERS
## 24 TEI        FBV   MALE     ADULT       20       1162          58.10      13
## 25 MSN        FBV   MALE     ADULT       10        625          62.50       0
## 26 JTB        FBV   MALE     ADULT       38       1647          43.34       5
## 27 CAT        FBV   MALE       SUB       48       3169          66.02       5
## 28 COC        FBV   MALE  JUVENILE       38       3475          91.45      74
## 29 CNG        FBV   MALE  JUVENILE       11        757          68.82       7
## 30 PAT        FBV   MALE  JUVENILE       24       2507         104.46      32
## 31 DIT        FBV FEMALE     ADULT       59       3012          51.05       2
## 32 CHU        FBV FEMALE     ADULT       50       4336          86.72       6
## 33 PSS        FBV FEMALE     ADULT       18       2601         129.17      10
## 34 AMR        FBV FEMALE     ADULT        7        827         118.14       0
## 35 DOR        FBV FEMALE  JUVENILE        7        215          30.71       0
## 36 PAM        FBV FEMALE  JUVENILE       20       2325         130.05      13
## 37 PAS        FBV FEMALE  JUVENILE       18        948          52.67       0
##    PROBE.EVENT SUCCESSFUL NUMBER.OF.PROBE.TOOLS
## 24           0         NA                    NA
## 25           0         NA                    NA
## 26           0         NA                    NA
## 27           0         NA                    NA
## 28           0         NA                    NA
## 29           0         NA                    NA
## 30           0         NA                    NA
## 31           0         NA                    NA
## 32           0         NA                    NA
## 33           0         NA                    NA
## 34           0         NA                    NA
## 35           0         NA                    NA
## 36           0         NA                    NA
## 37           0         NA                    NA

Descriptive statistics

Number of visits

The researchers stated:

  • SCNP capuchins visited the boxes at a mean = 213 visits/day; SD = 55; total individual visits = 1067.
  • FBV capuchins visited the boxes at a mean = 29 visits/day; SD = 12; total individual visits = 376

Calculating mean for SCNP visits:

> SCNPtotal <- sum(SCNP$NUMVISIT)
> SCNPtotal
## [1] 1067
> SCNPmean <- SCNPtotal/5 # exposed to the boxes for 5 days
> SCNPmean
## [1] 213.4

Everything looks fine! The mean denotes the total number of visits from all individuals divided by the number of days they were exposed to the box.

  • Do note that I was not able to use the mean() function: the researchers are calculating average total visits per day, not average visits per individual. I was puzzled over this for a good amount of time.

Standard deviation is a bit more tricky - because we’re calculating average total visits per day, and not average number of visits per individual, the sd() function will not work here.

  • I’m going to write a sample standard deviation function, but to be quite honest, I don’t think I have the right data to calculate the correct SD.
> sample_sd <- function(x, mean) {
+   n <- length(x)
+   mean_x <- mean
+   sqdiff <- (x - mean_x)^2
+   variance <- sum(sqdiff) / (n-1)
+   sd <- sqrt(variance)
+   return(sd)
+ }
> sample_sd(SCNP$NUMVISIT, SCNPmean)
## [1] 171.6772

After messing with this extensively, I’ve realized the only way I can calculate the correct standard deviation is if I have access to the number of total visits recorded during each day the SCNP group was exposed to the task, which is data the researchers did not include.

Looking at the sum for the FBV population:

> FBVtotal <- sum(FBV$NUMVISIT)
> FBVtotal
## [1] 368

Immediately, we can see the total number of visits for the FBV group (368) does not match what the researchers say it is (376). I’m choosing to chalk it up to a reduced dataset, but will be calculating the mean and standard deviation with the dataset I have access to.

Calculating mean:

> FBVmean <- FBVtotal/13
> FBVmean 
## [1] 28.30769

If the total visits added up correctly, their mean would be correct (i.e., 376/13 = 28.92)

Running into the same issue for standard deviation: I don’t have access to the actual number of visits per day for the FBV population, so I can’t replicate their number.

> sample_sd(FBV$NUMVISIT, FBVmean)
## [1] 17.32072

I struggle to comprehend the purpose of calculating the mean number of visits per day for the entire population, not by individual. I understand that there is a very large difference between the means, but without some kind of statistical comparison between the two, I’m afraid it’s not entirely effective.

Time spent visiting

For time data, the researchers stated the following: “FBV: mean time = 75 s, s.d. = 75 s, median = 48 s, range = 648 s; SCNP: mean time = 156 s, s.d. = 176 s, median = 92 s, range = 1398 s”

Here is what I think that means:

Mean time = the average of each individual’s mean time per visit

  • Both standard deviation and median will be calculated from individual mean time per visit values

Range = the longest total time visiting - the shortest total time visiting among all individuals in each population

(Spoiler alert: I’m very incorrect in these assumptions!)

Starting with the mean for the FBV population:

> mean(FBV$Mean_TIMEVISIT)
## [1] 78.08571

Immediately, I can tell this isn’t how the researchers calculated the mean. Trying a different method:

> FBVtotal.length <- sum(FBV$TIME.VISIT)
> FBVtotal.visits <- sum(FBV$NUMVISIT)
> FBVmean.visit <- (FBVtotal.length/FBVtotal.visits)
> FBVmean.visit
## [1] 75.0163

It seems that instead of pulling from the “mean time per visits” data column, the researchers have instead summed up the total time visiting for every individual and divided it by the total number of visits to find the average length of time per visit.

I suppose standard deviation could potentially be calculated using the mean time per visit of individuals and the mean value we derived earlier?

> sample_sd(FBV$Mean_TIMEVISIT, FBVmean.visit)
## [1] 32.45978

This isn’t working, so I’m going to re-calculate the “Mean_TIMEVISIT” value by hand, and use that instead. I don’t think it’ll make much of a difference, but it’s worth a try:

> FBV$average.visit <- FBV$TIME.VISIT/FBV$NUMVISIT
> # trying again
> sample_sd(FBV$average.visit, FBVmean.visit)
## [1] 33.12426

Unfortunately, nowhere close to the 75 we’re looking for. it’s close to the value we already have, so I don’t think miscalculation of the mean time value is the issue. Once again, because I don’t have access to the data denoting the length of each visit, I’m unable to calculate SD.

Trying median now:

> median(FBV$Mean_TIMEVISIT)
## [1] 67.42

The reduced dataset poses the same issue here.

Calculating range:

> max(FBV$TIME.VISIT)-min(FBV$TIME.VISIT)
## [1] 4121

This clearly is incorrect - I suspect it’s the range of times spent on each individual visit, which I don’t have access to.

With our experience from the FBV population, hopefully calculating values for the SCNP population will be easier:

Mean:

> SCNPtotal.length <- sum(SCNP$TIME.VISIT)
> SCNPtotal.visits <- sum(SCNP$NUMVISIT)
> SCNPmean.visit <- (SCNPtotal.length/SCNPtotal.visits)
> SCNPmean.visit
## [1] 156.2165

The mean looks right. I’ll run through the rest of the descriptive statistics for the SCNP group, but they’re incorrect for the same reasons as before.

Standard deviation, median, range:

> sample_sd(SCNP$Mean_TIMEVISIT, SCNPmean.visit)
## [1] 59.78304
> median(SCNP$Mean_TIMEVISIT)
## [1] 145.57
> max(SCNP$TIME.VISIT)-min(SCNP$TIME.VISIT)
## [1] 10816

Inferential Statistics

The Mann-Whitney U Test

The only reference to the Mann-Whitney U test (or Wilcoxon rank-sum test) the researchers conduct is a section of text in Table 1 that states: “Mann-Whitney test: Z = -4541, p < 0.0001, two-tailed.”

Because of the lack of description, and the fact that the objective of this paper is to demonstrate different tool use repertoires, my immediate inclination is that the test is being used to evaluate the frequency of probe use, denoted by the “PROBE.EVENT” variable.

> mw.probes <- wilcox.test(SCNP$PROBE.EVENT, FBV$PROBE.EVENT, paired = FALSE) # using the wilcox.test() function
> mw.probes
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SCNP$PROBE.EVENT and FBV$PROBE.EVENT
## W = 252, p-value = 0.0008902
## alternative hypothesis: true location shift is not equal to 0

The p-value here (0.0008902) does not match up with the p < 0.0001 they provide, so I need to look at other variables.

In the table, they list values for length of the direct engagement with the task (“TIME.VISIT”) and the mean time of visits (“Mean_TIMEVISIT”), so I’ll try it for that too:

> mw.time <- wilcox.test(SCNP$TIME.VISIT, FBV$TIME.VISIT, paired = FALSE)
> mw.time # p-value = 2.987e-07, which could definitely be the p < 0.0001 value they're talking about
## 
##  Wilcoxon rank sum exact test
## 
## data:  SCNP$TIME.VISIT and FBV$TIME.VISIT
## W = 306, p-value = 2.987e-07
## alternative hypothesis: true location shift is not equal to 0
> mw.meantime <- wilcox.test(SCNP$Mean_TIMEVISIT, FBV$Mean_TIMEVISIT, paired = FALSE)
> mw.meantime # okay, this spits out a p-value = 5.623e-06, which could also be the p < 0.0001 value they list
## 
##  Wilcoxon rank sum exact test
## 
## data:  SCNP$Mean_TIMEVISIT and FBV$Mean_TIMEVISIT
## W = 294, p-value = 5.623e-06
## alternative hypothesis: true location shift is not equal to 0

These definitely seem more promising; both p-values are less than 0.0001.

Finding Z-score

Next, I need to figure out this Z-score of -4541, which seems very far out of the range of possible Z values. There are a couple steps to this:

  • Find the Mann-Whitney U values for both tests
  • From the U values, approximate a Z-score using a normal distribution

Luckily, the W values R spits out from the wilcox.test() function are equivalent to the U values. Unluckily, for a reason unknown to me, R has given me the larger of the two W (or U) values obtained from the Wilcoxon-Mann-Whitney test.

After some finagling, I realized I just need to swap the order of my variables. Using the SCNP population - which has significantly higher values - as the reference level x[i] returns a higher W value, as R computes U as “the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i].

> mw.time2 <- wilcox.test(FBV$TIME.VISIT, SCNP$TIME.VISIT, paired = FALSE)
> mw.time2
## 
##  Wilcoxon rank sum exact test
## 
## data:  FBV$TIME.VISIT and SCNP$TIME.VISIT
## W = 16, p-value = 2.987e-07
## alternative hypothesis: true location shift is not equal to 0
> mw.meantime2 <- wilcox.test(FBV$Mean_TIMEVISIT, SCNP$Mean_TIMEVISIT, paired = FALSE)
> mw.meantime2
## 
##  Wilcoxon rank sum exact test
## 
## data:  FBV$Mean_TIMEVISIT and SCNP$Mean_TIMEVISIT
## W = 28, p-value = 5.623e-06
## alternative hypothesis: true location shift is not equal to 0

That looks a lot more reasonable! U = 16 when comparing total interaction time, and U = 28 when comparing mean time per visit.

There’s conflict in the literature about at what sample size the distribution of the U-statistic can be assumed to be approximately normal:

Assuming \(n_{1}\) represents the smaller of the two independent samples, and \(n_{2}\) represents the larger, Sidney Siegel states in Nonparametric Statistics for the Behavioral Sciences:

As \(n_{1}\) and \(n_{2}\) increase in size, the sampling distribution of U rapidly approaches the normal distribution … That is, when \(n_{2}\) > 20 we may determine the significance of an observed value of U by [calculating the Z-score] which is practically normally distributed (1956).

In our dataset, \(n_{2}\) = 23, meaning we can approximate the normal distribution under this parameter. However, the difference between the two sample sizes should be considered when determining whether to assume normality.

Regardless, because the researchers provide a Z-score, this is how we would calculate it:

\[z = \frac{U - \mu_{U}}{\sigma_{U}}\]

Where the mean of U, \(\mu_{U} = \frac{n_{1}n_{2}}{2}\)

and the standard deviation of U, \(\sigma_{U} = \sqrt{\frac{n_{1}n_{2}(n_{1}+n_{2}+1)}{12}}\)

I’m going to write a function to calculate the Z-score at least, because I need to run two different values (and may have to deal with more later):

  • Note that I tried to use the qnorm() function to find the Z-score. Just trust me that it did not work.

  • I’m also going to use a continuity correction of 0.5, because the Mann-Whitney U value hypothetically exists in a discrete distribution and we are approximating it to a continuous one (the normal distribution). Additionally, my sample size is relatively small.

> mw.z <- function(U, n1, n2, correction = TRUE) {
+   mean_U <- (n1*n2) / 2
+   sd_U <- sqrt((n1*n2*(n1+n2+1))/12)
+   correction <- if (correction) 0.5 else 0
+   z <- (U - mean_U + correction) / sd_U # because U < mean_U, applying a correction of +0.5
+   print(z)
+ }
> mw.z(16, length(FBV$TIME.VISIT), length(SCNP$TIME.VISIT))
## [1] -4.52521
> mw.z(28, length(FBV$Mean_TIMEVISIT), length(SCNP$Mean_TIMEVISIT))
## [1] -4.149414

These are very close to the value from the paper (minus the decimal point). Maybe I’ll take out the continuity correction:

> mw.z(16, length(FBV$TIME.VISIT), length(SCNP$TIME.VISIT), correction = FALSE)
## [1] -4.540868
> mw.z(28, length(FBV$Mean_TIMEVISIT), length(SCNP$Mean_TIMEVISIT), correction = FALSE)
## [1] -4.165072

I think Z = -4.540868 is what we’re looking for. If you round it to the thousandth, Z = -4.541, which matches up with the Z = -4541 the researchers state in the paper, if we ignore the missing decimal point. I’m choosing to assume that’s a typo in the text, meaning the mystery Wilcoxon-Mann-Whitney test they conduct is looking at the difference between the total length of direct interaction(s) between the FBV and SCNP populations.

Apart from the glaring typo, I don’t understand why the Z-score approximation was necessary; providing the U or W statistic, p-value, and type of test (i.e., two-tailed) is perfectly sufficient to portray the significance of their results, as well as ensure replicability. Additionally, although not crucial to demonstrate the significance, I found it odd that the researchers didn’t apply a continuity correction when approximating to a normal distribution.

Recreating Figures

Moving on to recreating Figure One and Figure Two. They are pretty simple bar graphs.

  • As a note, they are only looking males in the SCNP group, so I’ll be pruning the data accordingly.
  • They also removed the individual “CIN”, presumably because he had 0 probe events despite being in the SCNP group
> names(SCNP)
##  [1] "ID"                    "POPULATION"            "SEX"                  
##  [4] "GROUP.AGE"             "NUMVISIT"              "TIME.VISIT"           
##  [7] "Mean_TIMEVISIT"        "FINGERS"               "PROBE.EVENT"          
## [10] "SUCCESSFUL"            "NUMBER.OF.PROBE.TOOLS"
> SCNP_males <- subset(SCNP, SEX == "MALE") # narrowing data down to only males
> SCNP_males <- SCNP_males[SCNP_males$ID !="CIN", ] # removing CIN
> SCNP_males # checking this
##     ID POPULATION  SEX GROUP.AGE NUMVISIT TIME.VISIT Mean_TIMEVISIT FINGERS
## 1  TOR       SCNP MALE     ADULT       53       7715         145.57       0
## 2  BEI       SCNP MALE     ADULT       35       9974         284.97       1
## 3  ZAN       SCNP MALE     ADULT       63      10663         168.78       5
## 4  NIC       SCNP MALE     ADULT       42      11793         280.79       0
## 5  ZEN       SCNP MALE     ADULT       38       6513         171.39       2
## 6  CLA       SCNP MALE     ADULT       27       5570         206.30       0
## 7  BLP       SCNP MALE       SUB       34       9010         265.00       0
## 8  CAP       SCNP MALE  JUVENILE       47       9504         202.21       2
## 9  LIM       SCNP MALE  JUVENILE       27       6219         230.33      11
## 10 COR       SCNP MALE  JUVENILE       53       9659         182.25       7
## 11 VOL       SCNP MALE  JUVENILE       64       9201         143.77      10
## 12 PAD       SCNP MALE  JUVENILE       59       8175         138.56      10
## 13 DES       SCNP MALE  JUVENILE       66       7281         110.32      13
##    PROBE.EVENT SUCCESSFUL NUMBER.OF.PROBE.TOOLS
## 1          489        457                    70
## 2          656        616                    52
## 3          655        632                    91
## 4          797        735                    89
## 5          548        518                    36
## 6          342        330                    29
## 7          635        601                    31
## 8          784        766                    78
## 9          278        258                    42
## 10         805        726                   103
## 11         262        156                    25
## 12         150         37                    44
## 13          22          4                    12

I’m also going to reorder individuals because I want to match the order they’re listed in within the paper.

> SCNP_males$ID <- factor(SCNP_males$ID, levels = c("TOR", "BEI", "ZAN", "NIC", "ZEN", "CLA", "BLP", "CAP", "LIM", "COR", "VOL", "PAD", "DES"))

Figure 1

> ggplot(data=SCNP_males, aes(x=ID, y=NUMBER.OF.PROBE.TOOLS)) +
+   geom_bar(stat="identity", fill="red", width = 0.5) + 
+   labs(x = element_blank(), y = "no. probe tools", caption = "Figure 1. Number of sticks used by each monkey of the SCNP group.") +
+   theme(plot.caption = element_text(hjust = 0)) +
+   scale_y_continuous(breaks = seq(0, 100, by = 10), expand=expansion(mult=c(0,0.01))) 

And here is the original Figure 1 from the paper:

This makes the fact that the figures are inaccurate MUCH more noticeable - the researchers’ graphs definitely aren’t to scale, and it seems to outright plot some values incorrectly? For example, the individual “DES” used 12 probe tools, and the bar for him sits well under 10. Personally, I also would’ve kept a label for the x-axis.

Figure 2

First, I have to calculate the proportion of successful probing events for each individual:

> SCNP_males$PROBE.SUCCESS <- SCNP_males$SUCCESSFUL/SCNP_males$PROBE.EVENT
> SCNP_males$PROBE.SUCCESS # looks fine
##  [1] 0.9345603 0.9390244 0.9648855 0.9222083 0.9452555 0.9649123 0.9464567
##  [8] 0.9770408 0.9280576 0.9018634 0.5954198 0.2466667 0.1818182

Moving onto plotting:

> ggplot(data=SCNP_males, aes(x=ID, y=PROBE.SUCCESS)) +
+   geom_bar(stat="identity", fill="navy", width = 0.5) + 
+   labs(x = element_blank(), y = "proportion of successful probing", caption = "Figure 2. Proportions of successful probing by each monkey of the SCNP group.") + 
+   theme(plot.caption = element_text(hjust = 0)) +
+   scale_y_continuous(breaks = seq(0, 1.0, by = 0.2), expand = expansion(mult = c(0,0.05))) 

And here is the original Figure 2:

I adjusted my y-axis intervals - I don’t understand why the researchers used uneven intervals (0, 0.3, 0.5, 0.8, 1), and they certainly aren’t to scale. Putting the two figures side-by-side, we can once again see that some values are plotted incorrectly by the authors of the original paper. I doubt that R, Excel, or other plotting software would output graphs that are not-to-scale, so I wonder if they used Photoshop or something similar to create them.

Conclusions

After doing this assignment, I have a few thoughts:

  • The lack of statistical clarity (especially what the Mann-Whitney test was measuring) didn’t significantly affect the findings, as the difference in tool use regimens was so extreme between the two groups. However, it was frustrating that their dataset was so reduced it became impossible to replicate most of the summary statistics, and I think that defeats the purpose of having open-access data.

  • This ambiguity also meant I spent a significant amount of time on trial-and-error attempts at replicating the statistics, which could have been easily avoided.

  • The figures from the original paper portray incorrect values and are not to scale, and I can’t figure out why that is. Figure 1 does a good job of illustrating the variation in tool use by different SCNP individuals, but I don’t see the purpose of Figure 2 - all it portrays is that most males succeeded in their probing events, which I don’t think necessitates visual representation. (I will admit this point is rather nitpicky!)

  • I know this is out of the scope of this particular paper, but I found the sexual dimorphism in level of tool use striking. No SCNP females succeeded in the probing task, despite the fact that the SCNP population customarily uses probing tools, and I wish the researchers explored the potential mechanisms/implications of this further.